Introduction

“The simple graph has brought more information to the data analyst’s mind than any other device.”
— John Tukey

  • Data visualization is the creation and study of the visual representation of data.
  • Many tools for visualizing data (R is one of them)
  • Many approaches/systems within R for making data visualizations (ggplot2 is one of them)

ggplot2 \(\in\) tidyverse

  • ggplot2: tidyverse’s data visualization package
  • gg in “ggplot2” stands for Grammar of Graphics
  • Inspired by the book Grammar of Graphics by Leland Wilkinson
  • A grammar of graphics is a tool that enables concise description of components of a graphic



Dataset

Stanford Open Policing Project

Police Searches Drop Dramatically in States that Legalized Marijuana

  • Police Stop Data
    • state, driver race, stop rate, marijuana legalization status
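library(tidyverse)

# Keep Washington and Colorado, flag each quarter as pre- or post-legalization,
# and express the search rate as searches per 100 stops
stops <- read_csv("./data/opp-search-marijuana_state.csv") %>% 
  filter(state %in% c("WA", "CO")) %>% 
  mutate(legalization_status = ifelse(quarter <= "2013-01-01", "pre", "post"),
         search_rate_100 = search_rate * 100) 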
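
To see what the `mutate()` step produces, here is a minimal sketch on a few made-up rows. The `toy` data frame and its values are hypothetical (not taken from the Open Policing data), and the sketch assumes `quarter` is stored as a `Date`.

```r
library(dplyr)

# Hypothetical rows for illustration only; not real Open Policing data.
toy <- data.frame(
  quarter = as.Date(c("2012-07-01", "2013-01-01", "2013-04-01")),
  search_rate = c(0.012, 0.008, 0.004)
)

toy_labeled <- toy %>%
  mutate(
    # Quarters up to and including 2013-01-01 are labeled "pre"
    legalization_status = ifelse(quarter <= as.Date("2013-01-01"), "pre", "post"),
    # Convert a proportion into searches per 100 stops
    search_rate_100 = search_rate * 100
  )

toy_labeled
```

Because the comparison is `<=`, the boundary quarter itself falls on the "pre" side.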

Basic ggplot2 syntax

  • DATA
  • MAPPING
  • GEOM
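
The three pieces can be seen together in a single call. This is a sketch on a small made-up data frame (`stops_demo` and its values are hypothetical); `layer_data()` is ggplot2's helper for inspecting the data a layer will draw.

```r
library(ggplot2)

# Hypothetical values for illustration only.
stops_demo <- data.frame(
  quarter = rep(as.Date(c("2012-01-01", "2012-07-01",
                          "2013-01-01", "2013-07-01")), times = 2),
  driver_race = rep(c("Group A", "Group B"), each = 4),
  search_rate_100 = c(1.2, 1.0, 0.6, 0.4, 0.8, 0.7, 0.5, 0.3)
)

p <- ggplot(data = stops_demo,                   # DATA
            mapping = aes(x = quarter,           # MAPPING
                          y = search_rate_100,
                          color = driver_race)) +
  geom_point()                                   # GEOM

# Each mapped aesthetic shows up as a column in the data the geom draws
names(layer_data(p))
```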

Your turn!

Exercise: Determine which variable in the dataset is mapped to which aesthetic (x-axis, y-axis, etc.) of the plot.



Step-by-step


ggplot(data = stops)


ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100))


ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point()


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_point()


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess")


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE)


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d()


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d() +
  theme_minimal()


ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d() +
  theme_minimal() +
  labs(x = "Year", y = "Search Rate", color = "Driver Race",
       title = "Washington Highway Patrol Searches", subtitle = "Searches per Hundred Stops")


ggplot, the making of

  1. “Initialize” a plot with ggplot()
  2. Add layers with geom_ functions
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

For example, with the mpg dataset that ships with ggplot2:

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

Mapping

Size data points by a numerical variable

ggplot(data = stops, aes(x = quarter, y = search_rate_100, size = search_rate_100)) +
  geom_point()


Set alpha value

ggplot(data = stops, aes(x = quarter, y = search_rate_100, size = search_rate_100)) +
  geom_point(alpha = 0.5)


Your turn!

Exercise: Using information from https://ggplot2.tidyverse.org/articles/ggplot2-specs.html add color, size, alpha, and shape aesthetics to your graph. Experiment. Do different things happen when you map aesthetics to discrete and continuous variables? What happens when you use more than one aesthetic?

stops %>% 
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_point() + 
  # scale_fill_* applies to filled 2D objects like bars; scale_color_* to points and lines
  # scale_color_brewer(type = "qual", palette = "Dark2") +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme_minimal(base_size = 12) +
  theme(legend.title = element_blank()) +
  labs(title = "Washington")


Mappings can be at the geom level

ggplot(data = stops) +
  geom_point(mapping = aes(x = quarter, y = search_rate_100))


Different mappings for different geoms

ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point() +
  geom_smooth(aes(color = driver_race), method = "loess", se = FALSE)


Set vs. map

  • To map an aesthetic to a variable, place it inside aes()
ggplot(data = stops, 
  mapping = aes(x = quarter, 
                y = search_rate_100,
            color = driver_race)) +
  geom_point() 


  • To set an aesthetic to a value, place it outside aes()
ggplot(data = stops, 
  mapping = aes(x = quarter, 
                y = search_rate_100)) +
  geom_point(color = "red") 

ggplot(data = stops, 
  mapping = aes(x = quarter, 
                y = search_rate_100)) + 
  geom_point(color = "#63B3E8") 


Data can be passed in

stops %>%
  ggplot(aes(x = quarter, y = search_rate_100)) +
    geom_point()


Parameters can be unnamed

ggplot(stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()

Common early pitfalls

Mappings that aren’t

ggplot(data = stops) +
  geom_point(aes(x = quarter, y = search_rate_100, color = "blue"))
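
The pitfall can be checked directly. In this sketch (the `stops_demo` data frame and its values are made up for illustration), `"blue"` inside `aes()` is treated as a one-level variable and gets a color from the default palette, while `"blue"` outside `aes()` is used as the actual color.

```r
library(ggplot2)

# Hypothetical values for illustration only.
stops_demo <- data.frame(
  quarter = as.Date(c("2012-01-01", "2012-07-01", "2013-01-01", "2013-07-01")),
  search_rate_100 = c(1.2, 1.0, 0.6, 0.4)
)

# "blue" is mapped: it becomes a constant variable with a legend entry
not_blue <- ggplot(data = stops_demo) +
  geom_point(aes(x = quarter, y = search_rate_100, color = "blue"))

# "blue" is set: every point is drawn in blue
blue <- ggplot(data = stops_demo) +
  geom_point(aes(x = quarter, y = search_rate_100), color = "blue")

# Compare the colors each layer will actually draw
unique(layer_data(not_blue)$colour)
unique(layer_data(blue)$colour)
```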

Your turn!

Exercise: What is wrong with the following?

stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) %>%
    geom_point()

+ and %>%

What is wrong with the following?

stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) %>%
    geom_point()
## Error: `mapping` must be created by `aes()`
## Did you use %>% instead of +?
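
A corrected version might look like the sketch below: `%>%` passes the data into `ggplot()`, and `+` adds layers to the plot. The `stops_demo` data frame and its values are hypothetical stand-ins for `stops`.

```r
library(ggplot2)
library(dplyr)  # provides the %>% pipe

# Hypothetical values for illustration only.
stops_demo <- data.frame(
  quarter = rep(as.Date(c("2012-01-01", "2012-07-01",
                          "2013-01-01", "2013-07-01")), times = 2),
  search_rate_100 = c(1.2, 1.0, 0.6, 0.4, 0.8, 0.7, 0.5, 0.3),
  legalization_status = rep(c("pre", "pre", "pre", "post"), times = 2)
)

p <- stops_demo %>%   # pipe: hand the data frame to ggplot()
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) +
  geom_point()        # plus: add a layer to the plot

p
```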

Basic plot

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_point() 


Two layers

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()  +
  geom_line()

The power of groups

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() + 
  geom_line()
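
Why the lines separate once color is mapped: a discrete aesthetic also defines groups, and `geom_line()` draws one line per group. A minimal sketch on made-up data (`stops_demo` and its values are hypothetical):

```r
library(ggplot2)

# Hypothetical values for illustration only.
stops_demo <- data.frame(
  quarter = rep(as.Date(c("2012-01-01", "2012-07-01",
                          "2013-01-01", "2013-07-01")), times = 2),
  driver_race = rep(c("Group A", "Group B"), each = 4),
  search_rate_100 = c(1.2, 1.0, 0.6, 0.4, 0.8, 0.7, 0.5, 0.3)
)

# No discrete mapping: all rows fall into one group, so one zigzag line
one_line <- ggplot(stops_demo, aes(x = quarter, y = search_rate_100)) +
  geom_line()

# Mapping color to a discrete variable splits the rows into groups
by_race <- ggplot(stops_demo,
                  aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_line()

# Count the groups each plot will draw
n_groups_one_line <- length(unique(layer_data(one_line)$group))
n_groups_by_race  <- length(unique(layer_data(by_race)$group))
```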


Now we’ve got it

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(span = 0.2, se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


Control data by layer

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = filter(stops, search_rate_100 < .2),
             size = 5, color = "gray") +
  geom_point()
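
To confirm that each layer really draws its own data, the sketch below builds the same kind of plot on a made-up data frame (`stops_demo`, its values, and the 0.5 cutoff are hypothetical) and counts the rows per layer with `layer_data()`.

```r
library(ggplot2)

# Hypothetical values for illustration only.
stops_demo <- data.frame(
  quarter = rep(as.Date(c("2012-01-01", "2012-07-01",
                          "2013-01-01", "2013-07-01")), times = 2),
  driver_race = rep(c("Group A", "Group B"), each = 4),
  search_rate_100 = c(1.2, 1.0, 0.6, 0.4, 0.8, 0.7, 0.5, 0.3)
)

# Subset used only by the highlighting layer
low_rates <- subset(stops_demo, search_rate_100 < 0.5)

p <- ggplot(stops_demo,
            aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = low_rates, size = 5, color = "gray") +  # layer 1: subset
  geom_point()                                              # layer 2: all rows

rows_layer_1 <- nrow(layer_data(p, 1))
rows_layer_2 <- nrow(layer_data(p, 2))
```

The gray layer is listed first so that the colored points are drawn on top of it.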


Your turn!

Exercise: Work with your neighbor to sketch what the following plots will look like. No cheating! Do not run the code, just think through the code for the time being.

pre_legalization_high <- stops %>%
  filter((quarter < "2013-01-01" & search_rate_100 > 1.0))
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high, aes(y = search_rate_100, label = search_rate_100), 
            size = 2, color = "black")

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point()


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  geom_point(data = pre_legalization_high, size = 5, color = "gray")


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point()


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high, aes(y = search_rate_100, label = search_rate_100), 
            size = 2, color = "black")


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high, aes(y = search_rate_100 + .05, label = search_rate_100), 
            size = 2, color = "black")


# geom_text_repel() and geom_label_repel() come from the ggrepel package
library(ggrepel)
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() + 
  geom_text_repel(data = pre_legalization_high, 
                  aes(x = quarter, y = search_rate_100, 
                      label = as.character(quarter)), 
                  size = 3, color = "black")


ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() + 
  geom_label_repel(data = pre_legalization_high, 
                   aes(x = quarter, y = search_rate_100, 
                       label = as.character(quarter)), 
                   size = 3, color = "black")


Your turn!

Exercise: How would you fix the following plot?

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(color = "blue")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'


Specifying colors

ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  scale_color_manual(values = c("#FF6EB4", "#00BFFF", "#008B8B")) + 
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Splitting over facets

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_wrap(state ~ driver_race)


facet_grid

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_grid(state ~ driver_race)


facet_grid

ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_grid(driver_race ~ state)


Scales and legends


Scale transformation

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_reverse()


Scale transformation

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_sqrt()


Scale details

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, .75, 1.0))

Themes

Overall themes

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_bw()


Overall themes

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_dark() 


Customizing theme elements

ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))


Combining several plots to a grid

wa_stops <- stops %>% filter(state == "WA") %>% 
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(se = FALSE) + 
  labs(title = "Washington")

co_stops <- stops %>% filter(state == "CO") %>% 
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) + 
  geom_smooth(se = FALSE) + 
  labs(title = "Colorado") + 
  theme(legend.position = "none")

Combining several plots to a grid

# The patchwork package places plots side by side with + and stacks them with /
library(patchwork)
wa_stops + co_stops
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

(wa_stops / co_stops)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Interactivity

plotly::ggplotly(wa_stops)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Your turn!


Final Exercise:

Recreate this chart

Bonus: Add Colorado to the chart using the patchwork library. Play with themes and adjust titles, subtitles, captions, etc.

Themes Vignette

To really master themes:

ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-your-own-theme



Recap


The basics

  • map variables to aesthetics
  • add “geoms” for visual representation layers
  • scales can be independently managed
  • legends are automatically created
  • statistics are sometimes calculated by geoms
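
As an illustration of the last point, `geom_smooth()` does not draw the raw rows; it fits a model and draws the fitted curve. The sketch below uses a made-up data frame (`stops_demo` and its values are hypothetical) and swaps in `method = "lm"` rather than the loess smoother used earlier, only because the toy data has too few points for loess.

```r
library(ggplot2)

# Hypothetical values for illustration only.
stops_demo <- data.frame(
  quarter = rep(as.Date(c("2012-01-01", "2012-07-01",
                          "2013-01-01", "2013-07-01")), times = 2),
  driver_race = rep(c("Group A", "Group B"), each = 4),
  search_rate_100 = c(1.2, 1.0, 0.6, 0.4, 0.8, 0.7, 0.5, 0.3)
)

p <- ggplot(stops_demo,
            aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The layer's data holds the computed fit, evaluated on a grid of x values,
# so it has more rows than the input data
smoothed <- layer_data(p)
nrow(smoothed) > nrow(stops_demo)
```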

ggplot2 template

Make any plot by filling in the parameters of this template

knitr::include_graphics("./img/ggplot2-template.png")


Learn more